Integrative Windowing
In this paper we re-investigate windowing for rule learning algorithms. We show that, contrary to previous results for decision tree learning, windowing can in fact achieve significant run-time gains in noise-free domains, and we explain the different behavior of rule learning algorithms by the fact that they learn each rule independently. The main contribution of this paper is integrative windowing, a new type of algorithm that further exploits this property by integrating good rules into the final theory right after they have been discovered, thus avoiding re-learning these rules in subsequent iterations of the windowing process. Experimental evidence in a variety of noise-free domains shows that integrative windowing can in fact achieve substantial run-time gains. Furthermore, we discuss the problem of noise in windowing and present an algorithm that is able to achieve run-time gains in a set of experiments in a simple domain with artificial noise.
Comment: See http://www.jair.org/ for any accompanying file.
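As an illustration, the integrative-windowing loop described above might be sketched as follows. The data representation, the toy single-test rule learner, and the window-growing policy are illustrative assumptions, not the paper's actual implementation:

```python
def covers(rule, ex):
    attr, val = rule
    return ex.get(attr) == val

def learn_rule(window):
    # Toy rule learner: pick the single (attribute, value) test whose covered
    # examples in the window are the most purely positive.
    best, best_score = None, -1.0
    for ex, y in window:
        if not y:
            continue
        for attr, val in ex.items():
            cov = [yy for xx, yy in window if xx.get(attr) == val]
            score = sum(cov) / len(cov)
            if score > best_score:
                best, best_score = (attr, val), score
    return best

def integrative_windowing(data, init_size=3):
    theory, remaining = [], list(data)
    window = remaining[:init_size]   # sketch assumes the window holds positives
    while any(y for _, y in remaining):
        rule = learn_rule(window)
        if all(y for ex, y in remaining if covers(rule, ex)):
            # Good rule: integrate it into the final theory immediately and
            # drop the examples it covers, so it is never re-learned in
            # later windowing iterations.
            theory.append(rule)
            remaining = [(ex, y) for ex, y in remaining if not covers(rule, ex)]
            window = [(ex, y) for ex, y in window if not covers(rule, ex)]
        else:
            # Otherwise grow the window with misclassified examples.
            window += [(ex, y) for ex, y in remaining
                       if covers(rule, ex) and not y][:2]
    return theory
```

In a noise-free domain the consistency check against the full remaining data is what licenses integrating a rule early.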
Optimal investment and location decisions of a firm in a flood risk area using impulse control theory
Flooding events can affect businesses close to rivers, lakes or coasts. This paper provides an economic partial equilibrium model, which helps to understand the optimal location choice for a firm in flood risk areas and its investment strategies. How often, when and how much are firms willing to invest in flood risk protection measures? We apply Impulse Control Theory and develop a continuation algorithm to solve the model numerically. We find that the higher the flood risk and the more the firm values the future, i.e. the more sustainably the firm plans, the more the firm will invest in flood defense. Investments in productive capital follow a similar path. Hence, planning in a sustainable way leads to economic growth. Socio-hydrological feedbacks are crucial for the location choice of the firm, whereas different economic settings have an impact on investment strategies. If flood defense is already present, e.g. built up by the government, firms move closer to the water and invest less in flood defense, which allows them to generate higher expected profits. Surprisingly, firms with a large initial productive capital do not try to keep their market advantage, but rather reduce flood risk by reducing exposed productive capital.
Migration on request, a practical technique for preservation
Maintaining a digital object in a usable state over time is a crucial aspect of digital preservation. Existing preservation methods have many drawbacks. This paper describes advanced techniques of data migration which can be used to support preservation more accurately and cost-effectively.
To ensure that preserved works can be rendered on current computer systems over time, “traditional migration” has been used to convert data into current formats. As each new format becomes obsolete, another conversion is performed, and so on. Traditional migration has many inherent problems, as errors introduced during one transformation propagate through all future transformations.
CAMiLEON’s software longevity principles can be applied to a migration strategy, offering improvements over traditional migration. This new approach is named “Migration on Request.” Migration on Request shifts the burden of preservation onto a single tool, which is maintained over time. Always returning to the original format enables potential errors to be significantly reduced.
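The key design difference from traditional migration can be sketched as follows; the formats and converter functions are hypothetical placeholders, not CAMiLEON's actual tool:

```python
# The original object is stored once and never overwritten; a single
# maintained migration tool renders it into whatever format is current.
ORIGINAL = {"format": "v1", "payload": "hello world"}

# Every converter starts from the ORIGINAL format, never from the output of
# a previous migration, so conversion errors cannot accumulate across
# generations of formats (unlike traditional chained migration).
CONVERTERS = {
    "v2": lambda text: text.upper(),
    "v3": lambda text: text.upper() + "!",
}

def migrate_on_request(obj, target_format):
    # Conversion happens only at access time ("on request").
    if target_format == obj["format"]:
        return obj["payload"]
    return CONVERTERS[target_format](obj["payload"])
```

Maintaining the converter table over time replaces maintaining every migrated copy.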
Factorizing LambdaMART for cold start recommendations
Recommendation systems often rely on point-wise loss metrics such as the mean squared error. However, in real recommendation settings only a few items are presented to a user. This observation has recently encouraged the use of rank-based metrics. LambdaMART is the state-of-the-art algorithm in learning to rank which relies on such a metric. Despite its success, it lacks a principled regularization mechanism, relying instead on empirical approaches to control model complexity, which leaves it prone to overfitting.
Motivated by the fact that very often the users' and items' descriptions as well as the preference behavior can be well summarized by a small number of hidden factors, we propose a novel algorithm, LambdaMART Matrix Factorization (LambdaMART-MF), that learns a low-rank latent representation of users and items using gradient boosted trees. The algorithm factorizes LambdaMART by defining relevance scores as the inner product of the learned representations of the users and items. The low rank essentially acts as a model complexity controller; on top of it we propose additional regularizers to constrain the learned latent representations so that they reflect the user and item manifolds as these are defined by their original feature-based descriptors and the preference behavior. Finally, we also propose to use a weighted variant of NDCG to reduce the penalty for similar items with large rating discrepancy.
We experiment on two very different recommendation datasets, meta-mining and movies-users, and evaluate the performance of LambdaMART-MF, with and without regularization, in the cold-start setting as well as in the simpler matrix completion setting. In both cases it significantly outperforms current state-of-the-art algorithms.
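The scoring rule at the heart of the factorization, relevance as an inner product of learned low-rank representations, can be sketched minimally; the vectors below are hand-picked toy values, not learned by gradient boosted trees:

```python
def relevance(user_vec, item_vec):
    # LambdaMART-MF defines an item's relevance for a user as the inner
    # product of their low-rank latent representations.
    return sum(u * v for u, v in zip(user_vec, item_vec))

def rank_items(user_vec, item_vecs):
    # Items are presented to the user ordered by descending predicted
    # relevance, which is what a rank-based metric like NDCG evaluates.
    return sorted(item_vecs, key=lambda i: -relevance(user_vec, item_vecs[i]))
```

The dimensionality of the latent vectors is the rank, and capping it is what acts as the complexity controller described above.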
Constructing Artificial Data for Fine-tuning for Low-Resource Biomedical Text Tagging with Applications in PICO Annotation
Biomedical text tagging systems are plagued by the dearth of labeled training data. There have been recent attempts at using pre-trained encoders to deal with this issue. A pre-trained encoder provides a representation of the input text which is then fed to task-specific layers for classification, and the entire network is fine-tuned on the labeled data from the target task. Unfortunately, a low-resource biomedical task often has too few labeled instances for satisfactory fine-tuning. Also, if the label space is large, it contains few or no labeled instances for the majority of the labels. Most biomedical tagging systems treat labels as indexes, ignoring the fact that these labels are often concepts expressed in natural language, e.g. `Appearance of lesion on brain imaging'. To address these issues, we propose constructing extra labeled instances using label-text (i.e. the label's name) as input for the corresponding label-index (i.e. the label's index). In fact, we propose a number of strategies for manufacturing multiple artificial labeled instances from a single label. The network is then fine-tuned on a combination of real and these newly constructed artificial labeled instances. We evaluate the proposed approach on an important low-resource biomedical task called \textit{PICO annotation}, which requires tagging raw text describing clinical trials with labels corresponding to different aspects of the trial, i.e. PICO (Population, Intervention/Control, Outcome) characteristics of the trial. Our empirical results show that the proposed method achieves a new state-of-the-art performance for PICO annotation, with very significant improvements over competitive baselines.
Comment: International Workshop on Health Intelligence (W3PHIAI-20); AAAI-2
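The augmentation idea, turning each label's own name into extra training instances for its index, can be sketched as below. The label set and the templates are illustrative assumptions, not the paper's exact manufacturing strategies:

```python
LABELS = {0: "Population", 1: "Intervention", 2: "Outcome"}  # toy label space

def artificial_instances(label_index, label_text):
    # Manufacture several (input text, label index) pairs from one label:
    # the raw label name, plus the name embedded in simple hypothetical
    # natural-language templates.
    return [
        (label_text, label_index),
        (f"This sentence describes the {label_text.lower()}.", label_index),
        (f"{label_text} of the clinical trial.", label_index),
    ]

def augmented_training_set(real_instances):
    # Fine-tuning then runs on real instances plus the artificial ones,
    # so even labels with no real instances contribute training signal.
    data = list(real_instances)
    for idx, text in LABELS.items():
        data.extend(artificial_instances(idx, text))
    return data
```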
Aiding first incident responders using a decision support system based on live drone feeds
In case of a dangerous incident, such as a fire, a collision or an earthquake, a lot of contextual data is available to the first incident responders when handling this incident. Based on this data, a commander on scene or dispatchers need to make split-second decisions to get a good overview of the situation and to avoid further injuries or risks. Therefore, we propose a decision support system that can aid incident responders on scene in prioritizing the rescue efforts that need to be addressed. The system collects relevant data from a custom-designed drone by detecting objects such as firefighters, fires, victims, fuel tanks, etc. The drone autonomously observes the incident area and, based on the detected information, proposes to incident responders an action list prioritized by, e.g., urgency or danger.
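The prioritization step can be sketched minimally; the object classes and urgency weights are hypothetical, since a real system would derive them from domain experts and incident context:

```python
# Hypothetical urgency weights per detected object class.
URGENCY = {"victim": 10, "fire": 8, "fuel_tank": 6, "firefighter": 1}

def prioritized_actions(detections):
    # Order the drone's detections so the most urgent or dangerous ones
    # appear first in the action list shown to responders.
    return sorted(detections, key=lambda d: -URGENCY.get(d["label"], 0))
```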
Quantifying Model Complexity via Functional Decomposition for Better Post-Hoc Interpretability
Post-hoc model-agnostic interpretation methods such as partial dependence plots can be employed to interpret complex machine learning models. While these interpretation methods can be applied regardless of model complexity, they can produce misleading and verbose results if the model is too complex, especially w.r.t. feature interactions. To quantify the complexity of arbitrary machine learning models, we propose model-agnostic complexity measures based on functional decomposition: the number of features used, interaction strength, and main effect complexity. We show that post-hoc interpretation of models that minimize the three measures is more reliable and compact. Furthermore, we demonstrate the application of these measures in a multi-objective optimization approach which simultaneously minimizes loss and complexity.
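One of the three measures, the number of features used, can be sketched model-agnostically: probe the black-box prediction function and count a feature as used if changing only that feature can change a prediction. This is an illustrative probe, not the paper's exact estimator:

```python
def n_features_used(predict, dataset):
    # A feature counts as "used" if replacing its value (with another value
    # observed in the data) changes the model's prediction on some point.
    used = 0
    for j in range(len(dataset[0])):
        values = {row[j] for row in dataset}
        changed = any(
            predict(row[:j] + [v] + row[j + 1:]) != predict(row)
            for row in dataset for v in values
        )
        used += changed
    return used
```

Because the measure only queries `predict`, it applies to any model, which is the point of a model-agnostic complexity measure.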
Modelling fish habitat preference with a genetic algorithm-optimized Takagi-Sugeno model based on pairwise comparisons
Species-environment relationships are used for evaluating the current status of target species and the potential impact of natural or anthropogenic changes to their habitat. Recent research has reported that the results are strongly affected by the quality of the data set used. The present study attempted to apply pairwise comparisons to modelling fish habitat preference with Takagi-Sugeno-type fuzzy habitat preference models (FHPMs) optimized by a genetic algorithm (GA). The model was compared with the result obtained from the FHPM optimized based on mean squared error (MSE). Three independent data sets were used for training and testing of these models. The FHPMs based on pairwise comparisons produced variable habitat preference curves from 20 different initial conditions in the GA. This could be partially ascribed to the optimization process and the regulations assigned. This case study demonstrates the applicability and limitations of pairwise comparison-based optimization in an FHPM. Future research should focus on a more flexible learning process to make good use of the advantages of pairwise comparisons.
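A pairwise-comparison fitness for the GA can be sketched as follows: instead of matching absolute preference values (as an MSE fit does), score a candidate model by the fraction of sample pairs whose predicted ordering agrees with the observed ordering. This is an illustrative objective, not the study's exact formulation:

```python
def pairwise_agreement(pred, observed):
    # Fraction of sample pairs ranked in the same order by the model's
    # predicted preferences and by the observed data; ties count as
    # disagreement in this sketch.
    pairs = [(i, j) for i in range(len(pred)) for j in range(i + 1, len(pred))]
    ok = sum(1 for i, j in pairs
             if (pred[i] - pred[j]) * (observed[i] - observed[j]) > 0)
    return ok / len(pairs)
```

A GA would then evolve FHPM parameters to maximize this agreement rather than minimize MSE.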
Impact of tumor size and tracer uptake heterogeneity in (18)F-FDG PET and CT non-small cell lung cancer tumor delineation
The objectives of this study were to investigate the relationship between CT- and (18)F-FDG PET-based tumor volumes in non-small cell lung cancer (NSCLC) and the impact of tumor size and uptake heterogeneity on various approaches to delineating uptake on PET images. METHODS: Twenty-five NSCLC cancer patients with (18)F-FDG PET/CT were considered. Seventeen underwent surgical resection of their tumor, and the maximum diameter was measured. Two observers manually delineated the tumors on the CT images and the tumor uptake on the corresponding PET images, using a fixed threshold at 50% of the maximum (T(50)), an adaptive threshold methodology, and the fuzzy locally adaptive Bayesian (FLAB) algorithm. Maximum diameters of the delineated volumes were compared with the histopathology reference when available. The volumes of the tumors were compared, and correlations between the anatomic volume and PET uptake heterogeneity and the differences between delineations were investigated. RESULTS: All maximum diameters measured on PET and CT images significantly correlated with the histopathology reference (r > 0.89, P < 0.0001). Significant differences were observed among the approaches: CT delineation resulted in large overestimation (+32% ± 37%), whereas all delineations on PET images resulted in underestimation (from -15% ± 17% for T(50) to -4% ± 8% for FLAB) except manual delineation (+8% ± 17%). Overall, CT volumes were significantly larger than PET volumes (55 ± 74 cm(3) for CT vs. from 18 ± 25 to 47 ± 76 cm(3) for PET). A significant correlation was found between anatomic tumor size and heterogeneity (larger lesions were more heterogeneous). Finally, the more heterogeneous the tumor uptake, the larger was the underestimation of PET volumes by threshold-based techniques. CONCLUSION: Volumes based on CT images were larger than those based on PET images.
Tumor size and tracer uptake heterogeneity have an impact on threshold-based methods, which should not be used to delineate large heterogeneous NSCLC tumors, as these methods tend to largely underestimate the spatial extent of the functional tumor in such cases. For an accurate delineation of PET volumes in NSCLC, advanced image segmentation algorithms able to deal with tracer uptake heterogeneity should be preferred.
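The simplest of the delineation approaches compared above, the fixed threshold at 50% of the maximum (T(50)), can be sketched on a toy uptake grid; real delineation operates on 3-D PET voxel data:

```python
def t50_delineation(uptake, fraction=0.5):
    # Fixed-threshold delineation: a voxel belongs to the tumor volume if
    # its uptake is at least `fraction` of the image maximum. On
    # heterogeneous uptake this tends to underestimate the true extent,
    # which is the failure mode the study reports.
    peak = max(max(row) for row in uptake)
    return [[v >= fraction * peak for v in row] for row in uptake]
```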
Multi-score Learning for Affect Recognition: the Case of Body Postures
An important challenge in building automatic affective state recognition systems is establishing the ground truth. When the ground truth is not available, observers are often used to label training and testing sets. Unfortunately, inter-rater reliability between observers tends to vary from fair to moderate when dealing with naturalistic expressions. Nevertheless, the most common approach used is to label each expression with the most frequent label assigned by the observers to that expression.
In this paper, we propose a general pattern recognition framework that takes into account the variability between observers for automatic affect recognition. This leads to what we term a multi-score learning problem, in which a single expression is associated with multiple values representing the scores of each available emotion label. We also propose several performance measurements and pattern recognition methods for this framework, and report the experimental results obtained when testing and comparing these methods on two affective posture datasets.
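The difference between the common majority-label target and a multi-score target can be sketched as follows; using the fraction of observers choosing each emotion as the score is an illustrative choice, not necessarily the paper's scoring scheme:

```python
def majority_label(observer_labels):
    # The common single-label baseline: keep only the most frequent label,
    # discarding all information about observer disagreement.
    return max(set(observer_labels), key=observer_labels.count)

def label_scores(observer_labels, emotions):
    # Multi-score target: one score per available emotion label, here the
    # fraction of observers who assigned that label to the expression.
    n = len(observer_labels)
    return {e: observer_labels.count(e) / n for e in emotions}
```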